ABSTRACT
The general two population discrimination problem is
discussed briefly under various situations. Discrimination
procedures using the linear discriminant function and a
nonparametric procedure due to J. L. Hodges and E. Fix, which
classifies a random variable by assigning it to the population
which has the nearest observation to an observed value of the
random variable, are discussed and compared by computing the
probabilities of misclassification for both procedures when the
two populations are normal with equal covariance matrices.
Probabilities of misclassification are computed for the
nonparametric discriminator and the linear discriminant
function for two small sample sizes for the case when the
two populations being discriminated are exponential. In
this latter case, both discrimination procedures are shown
to give high probabilities of misclassification for certain
values of the parameters of the distributions being
discriminated. Regions are given in terms of the parameters of
the two exponential distributions where one of the probabilities
of error is greater than 0.5. A more complete investigation
for larger sample sizes is recommended for the linear
discriminant function and the nonparametric procedure
discussed in this paper for the case when the two populations
being discriminated are exponential.
SECTION I
INTRODUCTION

The two population discrimination problem may be summarized
as follows: given a random variable Z distributed
over some p-dimensional space according to a distribution F
or according to a distribution G, determine on the basis of
an observation, say z, of Z, which of the two distributions
Z has.

When F and G are completely known, the solution to the
problem is implicit in the Neyman-Pearson lemma. (1)  The
discrimination depends on the ratio $f(z)/g(z)$, where f and g
are the respective density functions of F and G. The rule is
as follows:

If $f(z)/g(z) > C$, decide in favor of F.

If $f(z)/g(z) < C$, decide in favor of G.

If $f(z)/g(z) = C$, the decision is arbitrary.

C is an appropriate positive constant chosen on the
basis of considerations relating to the importance of the two
possible errors:

(i)   $P_1$ = P (Z is assigned to G | Z came from F)
(ii)  $P_2$ = P (Z is assigned to F | Z came from G).

The two most widely advocated choices of C are:

(a)  Take C = 1

(b)  Choose C such that $P_1 = P_2$.


This procedure, known as the "likelihood ratio procedure,"
is known to have optimum properties with regard to
control of the probability of misclassification.
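
The decision rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`normal_pdf`, `likelihood_ratio_decision`) and the choice of two unit-variance normal densities are hypothetical and are not taken from the paper, and g(z) is assumed to be positive at the observed point.

```python
import math


def normal_pdf(z, mean, sd=1.0):
    """Density of a univariate normal distribution (illustrative choice of f and g)."""
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def likelihood_ratio_decision(z, f, g, C=1.0):
    """Apply the rule: ratio > C -> F, ratio < C -> G, ratio == C -> arbitrary.

    f and g are the (assumed known) density functions; g(z) must be positive.
    """
    ratio = f(z) / g(z)
    if ratio > C:
        return "F"
    if ratio < C:
        return "G"
    return "arbitrary"


# Hypothetical example: F = N(0, 1), G = N(2, 1), with the choice C = 1.
f = lambda z: normal_pdf(z, 0.0)
g = lambda z: normal_pdf(z, 2.0)
print(likelihood_ratio_decision(0.3, f, g))
print(likelihood_ratio_decision(1.7, f, g))
```

With C = 1 and these two densities the rule reduces to comparing z with the midpoint of the two means, which is the form the linear discriminant function takes in the univariate case.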

When F and G are known except for the values of one or
more parameters, the procedure used is much the same as that
just described. Under the assumption that F and G are known
except for one or more parameters, and if we can assume that
samples are available, say:

$X_1, X_2, X_3, \ldots, X_m$ from F

$Y_1, Y_2, Y_3, \ldots, Y_n$ from G

we are able to estimate the unknown parameters, denoted
collectively by $\theta$. By some estimation procedure, we can
estimate $\theta$ by $\hat\theta$ and assume that $F_{\hat\theta}$ and
$G_{\hat\theta}$ are the correct distribution functions. The
"likelihood ratio procedure" and the decision rules outlined
above can now be applied.

If it is assumed that F and G are p-variate normal
distributions having the same (unknown) covariance matrix
and unknown expectation vectors, the linear discriminant
function is a good example of this procedure. (2)  The given
samples are used to estimate the common covariance matrix and
the expectation vectors, and the "likelihood ratio procedure" is
used under the assumption that the estimated parameters are
known to be correct. It is known that under the normal
assumption for F and G and the homoscedastic assumption,
the linear discriminant function is an optimal procedure.


Although this procedure seems reasonable when the
assumed parametric form of the distributions is correct,
there is concern about the validity
of this procedure when the linear discriminant function is
used with data that are not normal, or that are normal with unequal
covariance matrices. In fact, in the normal situation when
the covariance matrices are not equal, a quadratic function
can be shown to be optimal. There is a need then for a
reasonable discrimination procedure whose validity does not
require the knowledge implied by the normality assumption,
the homoscedastic assumption or any assumption about the
parametric form.

Several classes of nonparametric discrimination procedures
were proposed in (3). These procedures were proven
to have asymptotic optimum properties for large samples.
In (4), some of these nonparametric procedures were investigated
when the samples were small. These procedures were
compared with the linear discriminant function where F and
G were assumed normal with equal covariance matrices, since
under these assumptions the linear discriminant function is
known to be optimal. A comparison was made by comparing the
probabilities of misclassification when the linear discriminant
function was used against the probabilities of misclassification
when the nonparametric procedures were used. A
survey of the procedures and results of (4) is given in
Section II of this paper.

In Section III of this paper, an investigation is made
of the performance of one of the nonparametric discriminators
discussed in (4) and of the performance of the linear
discriminant function when F and G are not normal but, in
fact, exponential with parameters $\lambda$ and $\mu$ respectively.
The exponential distribution was selected because of the role
it plays in the field of life testing and other applied
problems. It is shown that for sample sizes of 1 and 2,
both the nonparametric discriminator and the linear
discriminant function give very poor results for certain
values of $\lambda$ and $\mu$.

Detailed conclusions and recommendations made on the
basis of the results attained in Sections II and III are
contained in Section IV of this paper.

Professors R. R. Read and J. R. Borsting, of the U. S.
Naval Postgraduate School, have generously given their time
to provide direction, encouragement and valuable advice to
the author in the writing of this paper.


SECTION  II 
PERFORMANCE OF THE LINEAR DISCRIMINANT FUNCTION
AND  A  CLASS  OF  NONPARAMETRIC  DISCRIMINATORS 
WHEN  THE  TWO  POPULATIONS  BEING  DISCRIMINATED 
HAVE  NORMAL  DISTRIBUTIONS  WITH 
EQUAL  COVARIANCE  MATRICES 

Let $X_1, X_2, \ldots, X_m$ be a sample from a p-variate
distribution F and let $Y_1, Y_2, \ldots, Y_n$ be a sample from a
p-variate distribution G. It is assumed further that the parametric
forms of F and G are unknown. If z is an observation of a
random variable Z known to be distributed either as F or as G,
how is it decided on the basis of z to which population Z
belongs?  Define a distance function (in p-dimensional space)
which will permit a ranking of the m+n observations according
to their "nearness" to z. The idea of the discrimination
procedures outlined in (3) is to assign Z to the
population which has the most observations nearest to z.
Specifically, choose an odd integer, k, and assume for simplicity
that m = n; then Z is assigned to the distribution
from which came the majority of the k nearest observations.

In (3), it was shown that several classes of these nonparametric
discriminators have asymptotically optimum performance
as $m \to \infty$ and $n \to \infty$ at the same rate. By optimum
performance, it is meant that the probabilities of misclassification
$P_1$ and $P_2$, as defined in the introduction, tend to
the theoretical minimum values which they could have if F
and G were completely known.

The asymptotic properties and the simplicity of applying
the procedures of this class of nonparametric discriminators
suggest that this type of procedure might be a
reasonable alternative to the commonly applied linear
discriminant function. However, to propose an alternative to
the linear discriminant function solely on the basis of
asymptotic properties and ease of application would not be
entirely reasonable. In particular, the small sample performance
of such nonparametric discriminators needs investigation
to ascertain how much discrimination power is lost
when F and G are known to be normal with equal covariance
matrices so that the linear discriminant function is
appropriate. One way this investigation can be accomplished
is by comparing the probabilities of misclassification when
the linear discriminant function is used with the corresponding
probabilities of misclassification when the nonparametric
discriminators are used. Such an investigation
was made in (4). The remainder of Section II is devoted to
summarizing the procedures and results of (4).

It is first pointed out that the problem can be reduced
considerably by considering linear transformations in the
observation space. It is always possible by such transformations
to insure that F and G will have the identity
covariance matrix. In other words, in the new space the p
transformed measurements are independent in each population
and each measurement has unit variance. It is also possible
by such transformations to put the expectation vector of
the F population at the origin and the expectation vector of
the G population on the positive first axis. This allows
complete specification of the transformed populations by the
two parameters p and $\lambda$ where

$\lambda$ = E (first coordinate of Y)
    = distance between the means of the
      transformed populations.

In performing such linear transformations, $P_1$ and $P_2$ for the
linear discriminant function are unchanged. The probabilities
$P_1$ and $P_2$ for the nonparametric discriminators are
likewise unchanged since such linear transformations map
the totality of distance functions one-one into the totality
in the new space.

It is assumed that the sizes of the samples taken from
each population are equal, m = n. In the main, the distance
function used is

$$\Delta(x, z) = \max_{1 \le i \le p} |x_i - z_i|.$$

It should be pointed out that $\Delta$ is just one of a large class
of distance functions, any one of which could be used. This
fact is mentioned since the probabilities $P_1$ and $P_2$ depend
very heavily on the distance function chosen. Most of the
computations are made using k = 1, that is, assign Z to the
population F or G from which came the individual of the
pooled samples which most closely resembles Z.
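
The majority rule among the k nearest observations, with the maximum-coordinate distance, might be sketched as follows. The function names `max_distance` and `knn_assign` and the sample points are hypothetical, and ties in distance are broken here by list order (F before G), a detail the text does not address.

```python
def max_distance(x, z):
    """Maximum absolute coordinate difference between two p-dimensional points."""
    return max(abs(xi - zi) for xi, zi in zip(x, z))


def knn_assign(z, xs, ys, k=1):
    """Assign z to "F" or "G" by majority among the k nearest pooled observations.

    xs is the sample from F, ys the sample from G; k is assumed to be odd.
    """
    pooled = [(max_distance(x, z), "F") for x in xs]
    pooled += [(max_distance(y, z), "G") for y in ys]
    pooled.sort(key=lambda pair: pair[0])  # stable sort: equal distances keep list order
    votes_f = sum(1 for _, label in pooled[:k] if label == "F")
    return "F" if votes_f > k - votes_f else "G"


# Hypothetical bivariate samples (p = 2, m = n = 2).
xs = [(0.0, 0.0), (1.0, 1.0)]
ys = [(3.0, 3.0), (4.0, 4.0)]
print(knn_assign((0.1, 0.2), xs, ys, k=1))
print(knn_assign((3.5, 3.5), xs, ys, k=1))
```

Setting k = 1 gives the "rule of the nearest neighbor" used for most of the computations summarized in this section.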

The first case considered is the univariate case, p = 1.
Using the rule of the "nearest neighbor," that is, k = 1, and
the distance function $\Delta = |x - z|$, which corresponds to
ordinary Euclidean distance in this case, the probabilities $P_1$
and $P_2$ are computed for various values of n and $\lambda$.

For p = 1, the linear discriminant function is greatly
reduced since no matrix computation enters. The arithmetic
mean $(\bar X + \bar Y)/2$ of the sample means is computed and Z is
assigned to that population whose sample mean lies on the same
side of $(\bar X + \bar Y)/2$ as does Z itself. The probabilities of
misclassification are now readily computed.

From the symmetry of the problem, $P_1 = P_2$, so it is
sufficient to compute $P_1$; thus, it is assumed that Z is
distributed according to the F distribution. As was pointed out
previously, linear transformations make it possible to put
E(X) = 0, E(Y) = $\lambda$ > 0 and $\sigma_X^2 = \sigma_Y^2 = 1$ with no loss of
generality.

An error is committed by the linear discriminant
function if and only if,

(i)   $Z > (\bar X + \bar Y)/2$ and $\bar Y > \bar X$
(ii)  $Z < (\bar X + \bar Y)/2$ and $\bar Y < \bar X$.

Define $U = \bar Y - \bar X$ and $V = \bar X + \bar Y - 2Z$. It is easily shown that U
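and V are independent normal random variables with $E(U) = \lambda$,
$\sigma_U^2 = 2/n$, $E(V) = \lambda$, $\sigma_V^2 = 4 + 2/n$. In terms of
U and V, an error is committed by the linear discriminant
function if and only if UV < 0. Thus, for the linear
discriminant function when p = 1,

$$P_1 = P_2 = P(UV < 0) = \Phi\!\left(-\lambda\sqrt{n/2}\right)\left[1 - \Phi\!\left(\frac{-\lambda}{\sqrt{4 + 2/n}}\right)\right] + \left[1 - \Phi\!\left(-\lambda\sqrt{n/2}\right)\right]\Phi\!\left(\frac{-\lambda}{\sqrt{4 + 2/n}}\right),$$

where $\Phi$ is the standard normal distribution function. As
$n \to \infty$ this tends to $\Phi(-\lambda/2)$, and it is observed that the
maximum probability of misclassification is 0.5. The values of
$P_1 = P_2$ for various values of n and $\lambda$ are given in Table 1.
Figures 1 and 2 give these results graphically. All Tables and
Figures in Section II have been reproduced from (4).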
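
As a check on the tabled values, the error probability P(UV < 0) can be evaluated directly from the stated moments of U and V, using their independence. The sketch below assumes only what the text states (E(U) = E(V) = λ, variances 2/n and 4 + 2/n); the function names are hypothetical.

```python
import math


def std_normal_cdf(x):
    """Standard normal distribution function via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def ldf_error_normal(n, lam):
    """P(UV < 0) for independent normal U, V with the moments given in the text."""
    p_u_neg = std_normal_cdf(-lam / math.sqrt(2.0 / n))        # P(U < 0)
    p_v_neg = std_normal_cdf(-lam / math.sqrt(4.0 + 2.0 / n))  # P(V < 0)
    return p_u_neg * (1.0 - p_v_neg) + (1.0 - p_u_neg) * p_v_neg


for n in (1, 4, 20):
    print(n, round(ldf_error_normal(n, 1.0), 4))
```

For n = 4 and n = 20 with λ = 1 this gives values that round to the Table 1 entries .3472 and .3110.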

We consider now the nonparametric discriminator using
the "rule of the nearest neighbor," k = 1, which consists of
assigning Z to that population from which came the sample
individual nearest to z. Suppose that Z = z. Let $P_1(z)$
denote the conditional probability that the nearest of the
2n sample observations to z is a y, given Z = z. Then,

$$P_1 = E(P_1(Z)) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} P_1(z)\, e^{-z^2/2}\, dz. \qquad (1)$$
TABLE 1

PROBABILITY OF ERROR, LINEAR DISCRIMINANT FUNCTION
UNIVARIATE NORMAL DISTRIBUTIONS

   n      λ = 1     λ = 2     λ = 3

   1      .4175     .2532     .1235
   2      .3821     .1999     .0910
   3      .3611     .1819     .0826
   4      .3472     .1744     .0787
   5      .3376     .1707     .0763
  10      .3175     .1646     .0716
  20      .3110     .1616     .0692
  50      .3094     .1599     .0678
   ∞      .3085     .1587     .0668

n = size of sample taken from each population
λ = distance between the means of the two populations
Probability of error = P (Z is assigned to G | Z came from F)
                     = P (Z is assigned to F | Z came from G)
FIGURE 1
Probability of error $P_1$ of the linear discriminant
function for two univariate normal distributions with
distance between means = λ.
n = size of sample from each population.

FIGURE 2
Probability of error $P_1$ of the linear discriminant
function for two univariate normal distributions with
distance between the means = λ, plotted as a function of λ.
n = size of sample from each population
It remains then to calculate $P_1(z)$. Define

$$H_z(\delta) = P(|X - z| < \delta), \quad \delta > 0$$
$$\phantom{H_z(\delta)} = P(z - \delta < X < z + \delta)$$
$$\phantom{H_z(\delta)} = \Phi(z + \delta) - \Phi(z - \delta),$$

and

$$K_z(\delta) = P(|Y - z| < \delta)$$
$$\phantom{K_z(\delta)} = P(z - \lambda - \delta < Y - \lambda < z - \lambda + \delta)$$
$$\phantom{K_z(\delta)} = \Phi(z - \lambda + \delta) - \Phi(z - \lambda - \delta).$$

The event, "the nearest sample value to z is a y," can
be classified into the n exclusive equiprobable events, "the
nearest sample value to z is $y_i$," i = 1, 2, ..., n. Since the
nearest y to z will necessarily be the y at minimum distance
from z, it is necessary to compute the probability density
function for the minimum of $|Y_1 - z|, |Y_2 - z|, \ldots, |Y_n - z|$.
Since the $|Y_i - z|$, i = 1, 2, ..., n, are independent identically
distributed random variables, this density function is easily
shown to be:

$$n\,(1 - K_z(\delta))^{n-1}\, dK_z(\delta).$$

$P_1(z)$ is then computed by the following formula:

$$P_1(z) = n \int_0^{\infty} (1 - H_z(\delta))^{n}\, (1 - K_z(\delta))^{n-1}\, dK_z(\delta). \qquad (2)$$

Formulae (1) and (2) form the basis for all the computations
for the "nearest neighbor rule" no matter what the value of
p, if for p > 1 one replaces $P(|X - z| < \delta)$ by P (the distance
of X from z < δ) and similarly $P(|Y - z| < \delta)$ by P (the
distance of Y from z < δ). Of course the specific evaluations
depend upon the distance function used.
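
Formulae (1) and (2) lend themselves to straightforward numerical integration in the univariate normal case. The following is a minimal sketch, not the computation actually used in (4): it applies a plain midpoint rule on truncated ranges, and the function name, truncation limits and step size are all assumptions chosen for illustration.

```python
import math


def Phi(x):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def nn_error_normal(n, lam, z_lim=6.0, d_lim=16.0, h=0.05):
    """Midpoint-rule evaluation of P_1 = E(P_1(Z)) with P_1(z) from formula (2).

    F = N(0, 1), G = N(lam, 1); z is integrated over (-z_lim, z_lim) and
    delta over (0, d_lim).
    """
    nz = int(round(2.0 * z_lim / h))
    nd = int(round(d_lim / h))
    total = 0.0
    for i in range(nz):
        z = -z_lim + (i + 0.5) * h
        inner = 0.0
        for j in range(nd):
            d = (j + 0.5) * h
            H = Phi(z + d) - Phi(z - d)
            K = Phi(z - lam + d) - Phi(z - lam - d)
            dK = phi(z - lam + d) + phi(z - lam - d)  # derivative of K in delta
            inner += (1.0 - H) ** n * (1.0 - K) ** (n - 1) * dK
        total += phi(z) * n * inner * h
    return total * h


print(round(nn_error_normal(1, 1.0), 4))
```

For n = 1 the nearest neighbor rule coincides with the linear discriminant function, so the result for λ = 1 should agree with the corresponding Table 1 entry to within the integration error.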
Except for the case p = 1, n = 1, in which case $P_1$ and $P_2$
are the same for the linear discriminant function and the
nonparametric discriminator, the bulk of the computations
for the nonparametric discriminator were carried out by
straightforward numerical integration. These computations
are given in Table 2. These computations are quite heavy,
especially for the case p = 2. Therefore, a search for an
approximation formula for the computation of $P_1(z)$ was
instituted. One approximation formula was found which gave
very good results. A discussion of this approximation
formula is given in (4). $P_1$, as computed using the
approximation formula for $P_1(z)$, is tabled in Table 2-A. One very
interesting result which was obtained using the approximation
formula for $P_1(z)$ was that for large n,

$$P_1 \approx E\left[\frac{g(Z)}{f(Z) + g(Z)}\right] = \int_{-\infty}^{\infty} \frac{f(z)\, g(z)}{f(z) + g(z)}\, dz.$$

An application of Schwarz's inequality shows the latter
integral to be at most 0.5. It is thus possible to assert
that, whatever be the populations being discriminated, the
"rule of the nearest neighbor" will have in the limit as
$m = n \to \infty$ equal probabilities of error at most 0.5.
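
The limiting integral of f(z)g(z)/(f(z) + g(z)) is easy to evaluate numerically for two unit-variance normal densities whose means differ by λ. The sketch below uses a midpoint rule on a truncated range; the function name, the range and the step size are assumptions made for illustration.

```python
import math


def phi(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def asymptotic_nn_error(lam, lim=10.0, h=0.01):
    """Midpoint-rule value of the integral of f*g/(f+g), f = N(0,1), g = N(lam,1)."""
    steps = int(round(2.0 * lim / h))
    total = 0.0
    for i in range(steps):
        z = -lim + (i + 0.5) * h
        f = phi(z)
        g = phi(z - lam)
        total += f * g / (f + g)
    return total * h


for lam in (1.0, 2.0):
    print(lam, round(asymptotic_nn_error(lam), 3))
```

The values obtained for λ = 1 and λ = 2 can be compared with the last row of Table 2-A, and every value stays below the bound of 0.5.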

To compare the figures of Tables 1, 2, and 2-A, the
values of $P_1 = P_2$ for paired values of λ are plotted against
n in Figure 3. In Figure 4, the same values are plotted
against λ for selected values of n.
TABLE 2
PROBABILITY OF ERROR, NONPARAMETRIC DISCRIMINATOR
WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

   n      λ = 1     λ = 2     λ = 3

   1      .4175     .2532     .1235
   2      .4086     .236?     .1084
   3      .4052     .230?     .1036
   4      .4032     .2280     .1014

TABLE 2-A
APPROXIMATE PROBABILITY OF ERROR, NONPARAMETRIC
DISCRIMINATOR WITH k=1, UNIVARIATE NORMAL DISTRIBUTION

   n      λ = 1     λ = 2     λ = 3

   4      .403      .226      .102
   5      .401      .225      .100
  10      .399      .223      .098
  20      .398      .224      .098
  50      .398      .225      .098
   ∞      .398      .225      .098

n = size of sample from each population
λ = distance between the means of the two populations
Probability of error = P (Z is assigned to G | Z came from F)
                     = P (Z is assigned to F | Z came from G)
FIGURE 3
Comparison of the probability of error $P_1$ as a function of
n for the linear discriminant function and the nonparametric
discriminator, distance function Δ, k=1, for two normal
univariate populations with distance between means = λ.
n = size of sample from each population

FIGURE 4
Comparison of the probability of error $P_1$ as a function of
λ, the distance between the means, for the linear
discriminant function and the nonparametric discriminator,
distance function = Δ, k=1, for two normal univariate
populations

n = size of sample from each population
n = 1 is identical for both
--- indicates the nonparametric procedure
Not discussed in this paper, but investigated to a very
limited extent in (4), are the following cases:

(i)    the nonparametric discriminator using Δ as a
       distance function with k = 3 for the univariate
       and bivariate normal distributions
(ii)   the nonparametric discriminator using Δ as a
       distance function, k = 1, n = 1 for p = 2
(iii)  the effect of distance functions other than Δ on
       the probabilities of misclassification for the
       bivariate normal distribution

Although the investigation of the above cases was extremely
limited due to the laborious computations, the
results that were obtained indicated that the nonparametric
discrimination procedure gave "reasonable" error probabilities
in both cases (i) and (ii). In the bivariate normal
distribution, different distance functions produced vastly
different error probabilities in some situations.
SECTION III
PERFORMANCE OF THE LINEAR DISCRIMINANT FUNCTION
AND A CLASS OF NONPARAMETRIC DISCRIMINATORS
WHEN F AND G ARE EXPONENTIALLY DISTRIBUTED

In this section, a limited investigation of the linear
discriminant function and the nonparametric discriminator
using Δ as a distance function and using "the rule of the
nearest neighbor," k=1, is made when F and G are not normally
distributed, but in fact, exponentially distributed with
parameters λ and μ respectively. The performance of
both the linear discriminant function and the nonparametric
discriminator will be investigated again by computing the
probabilities of misclassification. Under the assumption
that F and G are exponentially distributed, it will be shown
that the linear discriminant function and the nonparametric
discriminator using Δ as a distance function and "the rule
of the nearest neighbor" can give high probabilities of
misclassification.

Throughout the remainder of the section, it will be
assumed that m = n and that F and G are exponentially
distributed with parameters λ and μ respectively. Because
of the heavy computations involved in computing the
probabilities of misclassification:

(i)   $P_1$ = P (assigning Z to G | Z came from F)
(ii)  $P_2$ = P (assigning Z to F | Z came from G)
the only cases investigated will be for p=1 and n=1, 2.

$P_1$ and $P_2$ will first be computed for the linear
discriminant function. The procedure here is precisely that
which was used in Section II for p = 1. One simply computes
the arithmetic mean $(\bar X + \bar Y)/2$ of the sample means and assigns
Z to that population whose sample mean lies on the same side of
$(\bar X + \bar Y)/2$ as does z itself. While $P_1 \ne P_2$, it is only
necessary to compute $P_1$ since $P_2$ can readily be computed from $P_1$
by interchanging λ and μ.

Proceeding as in Section II, define the new variables
$U = \bar Y - \bar X$ and $V = \bar X + \bar Y - 2Z$. If U and V are to be
independent, it is necessary that the covariance of U and V be
zero. Computing the covariance of U and V we have:

$$\mathrm{Cov}(U, V) = -\frac{1}{n}\left(\frac{1}{\lambda^2} - \frac{1}{\mu^2}\right) \ne 0 \quad \text{except for } \lambda = \mu.$$

Since discrimination is not possible for λ = μ, the
Cov(U, V) will not be zero and in general U and V will not be
independent. As before, an error is committed by the linear
discriminant function if and only if:

(i)   $Z > (\bar X + \bar Y)/2$ and $\bar Y > \bar X$
(ii)  $Z < (\bar X + \bar Y)/2$ and $\bar Y < \bar X$.

In terms of the variables U and V, an error is committed if
and only if UV < 0, and therefore,

$$P_1 = P(UV < 0).$$
Since U and V are not in general independent, the probability
that UV < 0 is not easily computed. It is necessary to
compute the joint density function for U and V and integrate
over the region where UV < 0. The joint density function
of U and V was computed but because of the complex nature of
this function, it was considered easier to compute $P_1$
directly. By (i) and (ii) and the definition of $P_1$ it follows
that:

$$P_1 = P\left(Z > \tfrac{\bar X + \bar Y}{2},\ \bar Y > \bar X\right) + P\left(Z < \tfrac{\bar X + \bar Y}{2},\ \bar Y < \bar X\right).$$

Let $T = n\bar Y$ and $S = n\bar X$; then $f_T$ is the gamma density
function with parameters n and μ, and $f_S$ is the gamma density
function with parameters n and λ. Since T, S, and Z are
independent random variables,

$$P_1 = \int_0^{\infty}\!\int_s^{\infty}\!\int_{(t+s)/2n}^{\infty} f_Z(z)\, f_T(t)\, f_S(s)\, dz\, dt\, ds + \int_0^{\infty}\!\int_0^{s}\!\int_0^{(t+s)/2n} f_Z(z)\, f_T(t)\, f_S(s)\, dz\, dt\, ds.$$

$P_1$ can now be computed by direct numerical integration. For
n=1, $P_1$ as a function of λ and μ is:

$$P_1(\lambda, \mu) = \frac{\mu\,(10\lambda^2 + 2\mu^2 + 15\lambda\mu)}{3\,(\mu + 2\lambda)(2\mu + \lambda)(\mu + \lambda)}.$$

By interchanging λ and μ, $P_2$ is:

$$P_2(\lambda, \mu) = P_1(\mu, \lambda).$$
Recognizing that the numerator and denominator in the
expressions for $P_1$ and $P_2$ are homogeneous of degree 3 in λ
and μ, $P_1$ and $P_2$ can be expressed in terms of a single
parameter c by setting λ = cμ. Making this substitution
in the expressions for $P_1$ and $P_2$ we have:

$$P_1(c) = \frac{10c^2 + 15c + 2}{3\,(1 + 2c)(1 + c)(2 + c)}$$

$$P_2(c) = P_1\!\left(\tfrac{1}{c}\right)$$

For n=1, $P_1$ and $P_2$ for the linear discriminant function are
the same as $P_1$ and $P_2$ for the nonparametric discriminator
using Δ as a distance function and "the nearest neighbor
rule," k=1.
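
The single-parameter form for n = 1 can be evaluated directly, and the value of c at which $P_2$ crosses 0.5 can be located by bisection. The sketch below assumes the expression $P_1(c) = (10c^2 + 15c + 2)/(3(1+2c)(1+c)(2+c))$ as read from the text; the function names and the bracketing interval (2, 10) are hypothetical choices.

```python
def p1_n1(c):
    """P_1(c) for n = 1, with lambda = c * mu."""
    return (10.0 * c * c + 15.0 * c + 2.0) / (3.0 * (1.0 + 2.0 * c) * (1.0 + c) * (2.0 + c))


def p2_n1(c):
    """P_2(c) = P_1(1/c)."""
    return p1_n1(1.0 / c)


def bisect(func, lo, hi, tol=1e-10):
    """Simple bisection for a root of func on [lo, hi] (a sign change is assumed)."""
    f_lo = func(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (func(mid) > 0.0) == (f_lo > 0.0):
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


for c in (1, 2, 3, 10):
    print(c, round(p1_n1(c), 4), round(p2_n1(c), 4))

# Value of c > 1 at which P_2 falls back to 0.5.
c_star = bisect(lambda c: p2_n1(c) - 0.5, 2.0, 10.0)
print(round(c_star, 4))
```

Under these assumptions $P_2$ exceeds 0.5 for c between 1 and the root found by the bisection, which illustrates how an error probability above one half can arise even with correctly labeled samples.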

For n=2, the substitution λ = cμ is again appropriate
and $P_1$ and $P_2$ for n=2 are as follows:

$$P_1(c) = \frac{128c^2(2c + 3)}{(c + 4)^2 (3c + 2)^3} + \frac{3c + 1}{(c + 1)^3} - \frac{128(4c + 1)}{25\,(3c + 2)^3}$$

$$P_2(c) = P_1\!\left(\tfrac{1}{c}\right)$$

Values of $P_1$ and $P_2$ for the linear discriminant function
for n=1 and 2 are tabled for various values of c in Table 3.
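
For n = 2 the linear discriminant error probability can be evaluated in the same way. The sketch below assumes the three-term expression $P_1(c) = 128c^2(2c+3)/((c+4)^2(3c+2)^3) + (3c+1)/(c+1)^3 - 128(4c+1)/(25(3c+2)^3)$ as reconstructed from the text; the function names are hypothetical.

```python
def p1_ldf_n2(c):
    """P_1(c) for the linear discriminant function with n = 2, lambda = c * mu."""
    term_a = 128.0 * c * c * (2.0 * c + 3.0) / ((c + 4.0) ** 2 * (3.0 * c + 2.0) ** 3)
    term_b = (3.0 * c + 1.0) / (c + 1.0) ** 3
    term_c = 128.0 * (4.0 * c + 1.0) / (25.0 * (3.0 * c + 2.0) ** 3)
    return term_a + term_b - term_c


def p2_ldf_n2(c):
    """P_2(c) = P_1(1/c)."""
    return p1_ldf_n2(1.0 / c)


for c in (1, 3, 5, 10):
    print(c, round(p1_ldf_n2(c), 4), round(p2_ldf_n2(c), 4))
```

At c = 1 the two populations coincide and the expression gives exactly one half, which is a useful sanity check on the reconstruction.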

$P_1$ and $P_2$ are next computed for the nonparametric
discriminator for the case n=2. The procedure used is exactly
the procedure used in Section II. The substitution λ = cμ
is once more appropriate. $P_1$ and $P_2$ in terms of a single
parameter c are as follows:
$P_1(c)$ = [closed-form expression in c, illegible in the source]    for $c \ne 2$

$P_1(c)$ = [closed-form expression in c, illegible in the source]    for $c = 2$

$$P_2(c) = P_1\!\left(\tfrac{1}{c}\right)$$

Values of $P_1$ and $P_2$ for various values of c for the
nonparametric discriminator with n=2 are given in Table 3.

It is observed in Table 3 that $P_1$ and $P_2$ exceed 0.5
for numerous values of c. Because of this observation, an
investigation was made to determine the values of c for
which $P_1$ and $P_2$ exceed 0.5. Figure 5 displays graphically
the regions in the λ, μ plane where $P_1$ and $P_2$ are greater
than 0.5.

Figure 5 points out only too well that great caution
should be used when applying the linear discriminant in
situations when the populations are other than normal.
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TABLE  3 
PROBABILITIES OF ERROR, UNIVARIATE
EXPONENTIAL DISTRIBUTIONS

                                            c
                              1       2       3       4       5      10

Linear Discriminant    P_1  .5000   .4000   .3262   .2741   .2360   .1385
Function*, n=1         P_2  .5000   .5333   .5214   .5037   .4870   .4329

Linear Discriminant    P_1  .5000   .3736   .2652   .2009   .1567   .0627
Function, n=2          P_2  .5000   .5299   .5041   .4782   .4577   .4056

Nonparametric          P_1  .5000   .4222   .3559   .3066   .2692   .1675
Discriminator, n=2     P_2  .5000   .5003   .4666   .4295   .3706   .3281

c is a parameter such that λ = cμ.
λ is the parameter of the F population.
μ is the parameter of the G population.
P_1 = P(assigning Z to G | Z came from F)
P_2 = P(assigning Z to F | Z came from G)
n = sample size

*For n=1, the probabilities of error P_1 and P_2 for the linear
discriminant function are equal to the corresponding probabilities
of error P_1 and P_2 for the nonparametric discriminator.
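
The entries of Table 3 can be spot-checked by simulation. The sketch below is a Monte Carlo check, not the exact computation of Section III. It assumes that λ and μ are rate parameters with μ = 1, and that the univariate linear discriminant rule with estimated means assigns Z to the population whose sample mean is nearer. Both assumptions are consistent with the footnote stating that the two rules coincide for n = 1. The function names, trial count, and seed are illustrative.

```python
import random


def nearest_neighbor_assigns_g(z, xs, ys):
    """True if the sample observation nearest to z comes from G."""
    return min(abs(z - y) for y in ys) < min(abs(z - x) for x in xs)


def linear_assigns_g(z, xs, ys):
    """True if z is nearer the G sample mean than the F sample mean.

    Assumed univariate form of the sample linear discriminant rule.
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    return abs(z - y_bar) < abs(z - x_bar)


def error_rates(rule, c, n, trials=200_000, seed=0):
    """Estimate (P_1, P_2) with F rate c and G rate 1, so that lambda = c * mu."""
    rng = random.Random(seed)
    lam, mu = float(c), 1.0
    to_g_from_f = 0
    to_f_from_g = 0
    for _ in range(trials):
        xs = [rng.expovariate(lam) for _ in range(n)]  # sample from F
        ys = [rng.expovariate(mu) for _ in range(n)]   # sample from G
        if rule(rng.expovariate(lam), xs, ys):         # Z drawn from F, sent to G
            to_g_from_f += 1
        if not rule(rng.expovariate(mu), xs, ys):      # Z drawn from G, sent to F
            to_f_from_g += 1
    return to_g_from_f / trials, to_f_from_g / trials


if __name__ == "__main__":
    cases = (
        ("linear, n=1", linear_assigns_g, 1),
        ("linear, n=2", linear_assigns_g, 2),
        ("nonparametric, n=2", nearest_neighbor_assigns_g, 2),
    )
    for label, rule, n in cases:
        for c in (1, 2, 3, 4, 5, 10):
            p1, p2 = error_rates(rule, c, n)
            print(f"{label:20s} c={c:<3d} P_1={p1:.4f} P_2={p2:.4f}")
```

Estimates from such a run can be set beside the corresponding entries of Table 3; agreement is to be expected only up to Monte Carlo error, which for 200,000 trials is roughly 0.001 per entry.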


[Figure 5: plot in the λ, μ plane of boundary lines λ = cμ
(labels include c = 1.0000 and c = 4.2122) for the linear
discriminant function*, n = 1; the linear discriminant
function, n = 2; and the nonparametric discriminator, n = 2.]

FIGURE 5

Values of λ and μ for which P_1 and P_2 exceed 0.5

c = parameter such that λ = cμ
λ = parameter of F distribution
μ = parameter of G distribution
P_1 = P(Z is assigned to G | Z came from F)
P_2 = P(Z is assigned to F | Z came from G)
n = sample size

*Linear discriminant function is equivalent to the
nonparametric discriminator for n = 1.
SECTION  IV
SUMMARY  AND  CONCLUSIONS 

In any discrimination problem one has a choice between
using parametric or nonparametric procedures. This choice
in general will depend upon three factors:

(i)   the strength of the user's belief in his parametric
      model.
(ii)  the loss that would be suffered by using the
      nonparametric rule if in fact the parametric form is
      correct.
(iii) the loss that would be suffered by using the
      parametric rule if the actual densities depart
      from the parametric form assumed.
For the two population discrimination problem, Section
II of this paper concerned itself with (ii). In Section II,
it was assumed that the two populations being discriminated
were normal with equal covariance matrices. For the
univariate case, the parametric procedure used was the well
known linear discriminant function, which is known to be
optimal in this situation. The nonparametric procedure used
was the rule whereby a random variable was classified as
belonging to the population which had the nearest observation
to an observed value of the random variable being classified.
A comparison of these two procedures was made by computing
and comparing the probabilities of misclassification.
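
Factor (ii) can be illustrated numerically. The following is a minimal sketch, not the computation of Section II. It assumes F = N(0, 1) and G = N(delta, 1), takes the error of the linear discriminant function with known parameters as Φ(−delta/2), and estimates the error of the nearest-observation rule by simulation. The function names, `delta`, the trial count, and the seed are choices made for this example.

```python
import math
import random


def optimal_error(delta):
    """Error of the linear discriminant with known parameters: Phi(-delta/2)."""
    return 0.5 * (1.0 + math.erf(-delta / (2.0 * math.sqrt(2.0))))


def nearest_observation_error(delta, n, trials=100_000, seed=0):
    """Estimate P(assign Z to G | Z from F) for F = N(0, 1), G = N(delta, 1)."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n)]    # sample from F
        ys = [rng.gauss(delta, 1.0) for _ in range(n)]  # sample from G
        z = rng.gauss(0.0, 1.0)                         # Z drawn from F
        if min(abs(z - y) for y in ys) < min(abs(z - x) for x in xs):
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for delta in (0.5, 1.0, 2.0, 3.0):
        print(delta,
              round(optimal_error(delta), 4),
              round(nearest_observation_error(delta, n=2), 4))
```

The gap between the two printed columns is an estimate of the loss incurred by using the nonparametric rule when the normal model in fact holds, for the small sample size chosen.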
Also for the two population discrimination problem, an
investigation of the linear discriminant function and the
same nonparametric procedure was carried out when the two
populations were not normal but exponential. Again the
investigation was made by computing the probabilities of
misclassification for both procedures. This investigation was
made in Section III of this paper. Because of the lengthy
computations involved in computing the probabilities of error
for both of these procedures, the only cases considered were
the univariate case for sample sizes of 1 and 2. It was
shown that for the two cases investigated, sample sizes of
1 and 2, both procedures could give poor results
depending on the parameters of the distributions.

In conclusion, it seems reasonable that if the
populations to be discriminated are well known, and have been
investigated to be such that the normal distribution gives
a good fit and that the variance and correlation do not
change much when the means are changed, and if the
classification to be made warrants the labor of matrix inversion,
then the linear discriminant function should be used.
However, if the populations are either not well known, or are
known not to be approximately normal or to have very
different covariance matrices; or if the discrimination is such
that small decreases in probability of error are not worth
extensive computations, then a nonparametric procedure seems
to be advisable. Which nonparametric procedure to use is a
matter of choice for the user.

Recommendations to be made on the basis of this paper
are:

(i)   tabulate the probabilities of error for the linear
      discriminant function in representative situations
      for the case where the populations being
      discriminated are multivariate normal with equal
      covariance matrices.
(ii)  further investigation (for larger sample sizes)
      of the linear discriminant function in the case
      where the populations being discriminated are
      exponential, because of the importance of the
      exponential distribution in the field of life
      testing and other applied problems.
(iii) investigation as to the effect of other distance
      functions for the nonparametric discriminator
      discussed in this paper in the case when the
      populations being discriminated are exponential or
      some other class of distributions.
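
Recommendation (iii) could be explored along the following lines. This is a hedged sketch only: the nearest-observation rule is written with the distance function as a parameter, and the logarithmic distance shown is a hypothetical alternative chosen purely for illustration, not one proposed in this paper. As before, λ and μ are assumed to be rate parameters with μ = 1, and all names are illustrative.

```python
import math
import random


def absolute_distance(z, x):
    """Distance used by the nearest-observation rule discussed in the text."""
    return abs(z - x)


def log_ratio_distance(z, x, tiny=1e-300):
    """Hypothetical alternative: distance between logarithms.

    The guard `tiny` avoids log(0) in the unlikely event of a zero draw.
    """
    return abs(math.log(max(z, tiny)) - math.log(max(x, tiny)))


def p1_estimate(distance, c, n, trials=100_000, seed=0):
    """Estimate P(assign Z to G | Z from F) with F rate c and G rate 1."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        xs = [rng.expovariate(c) for _ in range(n)]    # sample from F
        ys = [rng.expovariate(1.0) for _ in range(n)]  # sample from G
        z = rng.expovariate(c)                         # Z drawn from F
        nearest_f = min(distance(z, x) for x in xs)
        nearest_g = min(distance(z, y) for y in ys)
        if nearest_g < nearest_f:
            errors += 1
    return errors / trials


if __name__ == "__main__":
    for c in (1, 2, 3, 4, 5, 10):
        print(c,
              round(p1_estimate(absolute_distance, c, n=2), 4),
              round(p1_estimate(log_ratio_distance, c, n=2), 4))
```

Any other distance can be passed in the same way. No claim is made here about which choice gives the smaller probabilities of error; that is the question the recommendation leaves open.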